System and method for low-latency communication over unreliable networks

ABSTRACT

A method for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model, characterized in that the method includes: representing at least one frame of time series data at the first device, wherein the at least one frame of time series data is a series of data points indexed in time order; recording at least one output stream, a metadata associated with the at least one output stream, and a plurality of external inputs from the first device in an interaction recorder of the second device, wherein the at least one output stream includes the at least one frame of time series data; segmenting a background area of an image into at least one background area stream, wherein the at least one background area stream is captured from a plurality of users; compressing at least one character centered portion of the image into a character focus stream for enabling an output image to be treated as two streams; training the at least one predictive machine learning model at the first device for predictive frame regeneration by providing the at least one output stream from the interaction recorder as an input; transmitting the results or interactions in time series to the second device, from the first device; detecting at least one lost frame of time series data using the at least one predictive machine learning model, at the second device; regenerating the at least one lost frame of the time series data at the second device using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and comparing an application stream from a stream of data obtained from the unreliable network with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model at the second device using a decision engine, wherein the application stream includes the at least one frame of time series data.

TECHNICAL FIELD

The present disclosure relates generally to a system and a method for low-latency communication over an unreliable network using a predictive machine learning model; moreover, the aforesaid system employs, when in operation, machine learning techniques that regenerate data lost during transmission, for example by regenerating time series data from previously received time series data.

BACKGROUND

Latency is the time interval between a stimulus and the response or, from a more general point of view, the time delay between the cause and the effect of some physical change in the system being observed. Latency is physically a consequence of the limited velocity with which any physical interaction can propagate. The magnitude of this velocity is always less than or equal to the speed of light. Therefore, every physical system will experience some latency, regardless of the nature of the stimulation to which it has been exposed.

Low-latency communication is typically performed over an unreliable network channel. When low-latency communication relies on the network channel to enforce reliability, latency may become unpredictable, because the channel may retry an uncontrolled number of times when data is lost in transmission. This may cause message latency to become unbounded. Interactive systems such as autonomous vehicles, robotics, multiplayer video gaming, virtual reality/augmented reality, remote music jamming and telepresence systems depend on low-latency communication for delivering control data, data generated from interactions, and updates to the system state. To keep communication latency low, these systems may use unreliable packet transport. One example of such unreliable transport is the User Datagram Protocol (UDP). As a result, packet loss is inevitable because of system conditions such as congestion, interference and physical conditions leading to bit errors in the transport medium.

Further, unreliable network protocols (e.g. UDP) do not attempt retries in the presence of packet or data loss. The sender and receiver logic must therefore manage any detection and recovery of lost data. Further, retransmission of the lost data by the sender is undesirable, as it carries a high latency cost equivalent to the latency of the channel.
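Because UDP does not retransmit, the receiver can only detect a loss if each datagram carries enough information to reveal a gap. The sketch below assumes a hypothetical 12-byte header prepended to each frame. The header holds a 32-bit sequence number and a 64-bit send timestamp in microseconds, in network byte order. The layout and the names `Frame`, `encode_frame` and `decode_frame` are illustrative and are not taken from this disclosure.

```python
import struct
from dataclasses import dataclass

# Hypothetical header: 32-bit sequence number + 64-bit send timestamp (microseconds),
# network byte order with no padding ("!" disables alignment), 12 bytes in total.
HEADER = struct.Struct("!IQ")


@dataclass
class Frame:
    seq: int
    timestamp_us: int
    payload: bytes


def encode_frame(frame: Frame) -> bytes:
    """Prefix the payload with the header so the receiver can detect gaps."""
    return HEADER.pack(frame.seq & 0xFFFFFFFF, frame.timestamp_us) + frame.payload


def decode_frame(datagram: bytes) -> Frame:
    """Split a received datagram back into header fields and payload."""
    if len(datagram) < HEADER.size:
        raise ValueError("datagram shorter than header")
    seq, timestamp_us = HEADER.unpack_from(datagram)
    return Frame(seq, timestamp_us, datagram[HEADER.size:])
```

On the sending side, the encoded bytes would typically be passed to `socket.sendto`. Wrap-around of the 32-bit counter is masked here but not otherwise handled.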

US patent publication number US20120059783 discloses a technique in which authority over an artificial intelligence (AI) asset can be controlled among two or more processing devices running a common program over a network, and in which that authority can be transferred. A first processing device can exercise authority over the AI asset by executing code that controls one or more actions of the AI asset according to a decision tree. The decision tree can determine whether to engage a program asset based on criteria other than the distance between the AI asset and the program asset. The first processing device can broadcast a state of the AI asset to one or more other devices running the program. If the decision tree determines that the AI asset should engage a program asset over which another processing device has authority, the first processing device can relinquish authority over the AI asset and transfer authority to the other device.

PCT publication number WO2009043066 discloses a method for enhancing wide-band speech audio signals in the presence of background noise and, more particularly, a noise suppression system, a noise suppression method and a noise suppression program. More specifically, that invention relates to low-latency single-channel noise reduction using sub-band processing based on masking properties of the human auditory system.

US patent publication number US20170374164 discloses a method for transmission and low-latency real-time output and/or processing of an audio data stream that is transmitted from at least one transmitter to at least one receiver over a jittering transmission path. The method includes a calibration for determining a distribution of latencies in the transmission of packets of the audio data stream, whereby a group of packets of the audio data stream is used as calibration packets, and wherein a reference time grid and an offset of a fastest calibration packet are determined. An output time grid for audio output and/or processing is then shifted based on the reference time grid and the determined offset of the fastest calibration packet, and the audio packets of the audio data stream are provided according to the output time grid for audio output and/or processing.

PCT publication number WO2016030694 discloses a system for transmitting low-latency, synchronised audio that includes an audio source, a processor, a controller and a sink zone with a DAC. In particular, the processor is capable of selectively resampling the audio source in order to output a data packet for transmission to the sink zone that has a maximised payload size while the packet frequency remains a whole number. However, none of the above prior art effectively detects packets or data lost during transmission and regenerates the lost data at a receiver while keeping latency low.

Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks in existing approaches for low-latency communication from a first device to a second device over unreliable networks, so as to regenerate data lost during transmission while keeping latency low.

SUMMARY

The present disclosure provides a method for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model, characterized in that the method comprises:

representing at least one frame of time series data at the first device, wherein the time series data is a series of data points indexed in time order;
recording at least one output stream, a metadata associated with the at least one output stream, and a plurality of external inputs from the first device in an interaction recorder of the second device, wherein the at least one output stream comprises the at least one frame of time series data;
segmenting a background area of an image into at least one background area stream, wherein the at least one background area stream is captured from a plurality of users;
compressing at least one character centered portion of the image into a character focus stream for enabling an output image to be treated as two streams;
training the at least one predictive machine learning model at the first device for a predictive frame regeneration by providing the at least one output stream from the interaction recorder as an input;
transmitting the results or interactions in a time series to the second device, from the first device;
detecting at least one lost frame of time series data using the at least one predictive machine learning model, at the second device;
regenerating the at least one lost frame of time series data at the second device using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and
comparing an application stream from a stream of data obtained from the unreliable network with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model at the second device using a decision engine, wherein the application stream comprises the at least one frame of time series data.

It will be appreciated that the aforesaid method is not merely a “method of doing a mental act”, but has a technical effect in that the method functions as a form of technical control using machine learning or statistical analysis of a technical artificially intelligent system. The method involves regenerating at least one lost frame of the time series data to solve the technical problem of enabling low-latency communication while recovering time series data lost during transmission.

The present disclosure also provides a first device that enables low-latency communication with a second device over an unreliable network using at least one predictive machine learning model, characterized in that the first device comprises: one or more processors;

one or more non-transitory computer-readable mediums storing one or more sequences of instructions, which when executed by the one or more processors, cause:
representing at least one frame of time series data at the first device, wherein the at least one frame of time series data is a series of data points indexed in time order;
recording at least one output stream, a metadata associated with the at least one output stream, and a plurality of external inputs in an interaction recorder of the second device, wherein the at least one output stream comprises the at least one frame of time series data;
segmenting a background area of an image into at least one background area stream, wherein the at least one background area stream is captured from a plurality of users;
compressing at least one character centered portion of the image into a character focus stream for enabling an output image to be treated as two streams;
training the at least one predictive machine learning model for predictive frame regeneration by providing the at least one output stream from the interaction recorder as an input; and
transmitting results or interactions in a time series to the second device.

The present disclosure also provides a second device that enables low-latency communication with a first device over an unreliable network using at least one predictive machine learning model, characterized in that the second device comprises:

one or more processors;
one or more non-transitory computer-readable mediums storing one or more sequences of instructions, which when executed by the one or more processors, cause:
receiving the results or interactions in the time series from the first device, wherein the results or interactions comprise a state space representation or the modified output stream of the at least one frame of time series data, wherein the state space representation comprises interactions between the first device and the second device;
detecting at least one lost frame of time series data using the at least one predictive machine learning model;
regenerating the at least one lost frame of the time series data using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and
comparing an application stream from a stream of data obtained from the unreliable network with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model using a decision engine, wherein the application stream comprises the at least one frame of time series data.

The present disclosure also provides a computer program product comprising instructions to cause the first device and the second device to carry out the above-described method.

Embodiments of the present disclosure substantially eliminate or at least partially address the aforementioned drawbacks in existing approaches for low-latency communication from a first device to a second device over unreliable networks to regenerate lost data during transmission while keeping latency low.

Additional aspects, advantages, features and objects of the present disclosure are made apparent from the drawings and the detailed description of the illustrative embodiments construed in conjunction with the appended claims that follow.

It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those skilled in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.

Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:

FIG. 1 is a schematic illustration of a low-latency peer-to-peer communication in accordance with an embodiment of the present disclosure;

FIG. 2 is a schematic illustration of a low-latency server-to-client-device communication in accordance with an embodiment of the present disclosure;

FIG. 3 is a schematic illustration of a cloud game interactive system that comprises a first device and a second device in accordance with an embodiment of the present disclosure;

FIG. 4 is a schematic illustration of the cloud game interactive system that comprises an interaction recorder in accordance with an embodiment of the present disclosure;

FIG. 5 is a schematic illustration of a multi-interactive application instance system that comprises an interaction recorder in accordance with an embodiment of the present disclosure;

FIG. 6 is a schematic illustration of a first device with the interaction recorder in accordance with an embodiment of the present disclosure;

FIG. 7 is a schematic illustration of at least one predictive machine learning model that is trained using at least one frame of time series data from an interaction recorder in accordance with an embodiment of the present disclosure;

FIG. 8 is a schematic illustration of a first device with an interaction recorder in accordance with an embodiment of the present disclosure;

FIG. 9 is a schematic illustration of a second device that receives a low-latency stream over an unreliable network in accordance with an embodiment of the present disclosure;

FIG. 10 is a schematic illustration of an architecture of a low-latency audio stream encoder system in accordance with an embodiment of the present disclosure;

FIG. 11 is a schematic illustration of an architecture of a low-latency audio stream decoder system in accordance with an embodiment of the present disclosure;

FIG. 12 is a schematic illustration of a predictive machine learning model training system in accordance with an embodiment of the present disclosure;

FIG. 13 is a schematic illustration of an encoder with a classifier model in accordance with an embodiment of the present disclosure;

FIG. 14 is a schematic illustration of a decoder with a frame generator in a model training engine in accordance with an embodiment of the present disclosure;

FIG. 15 is a schematic illustration of a model selector and bundling system of a second device or a first device in accordance with an embodiment of the present disclosure;

FIG. 16 is a schematic illustration of an adaptive model selection system in accordance with an embodiment of the present disclosure;

FIG. 17 is a schematic illustration of a frame generator with a decoder in accordance with an embodiment of the present disclosure;

FIG. 18 is a schematic illustration of a frame classifier with a model generator for an audio stream in accordance with an embodiment of the present disclosure;

FIG. 19 is a schematic illustration of a model regenerator with multiple streams in accordance with an embodiment of the present disclosure;

FIG. 20 is a schematic illustration of a cloud mixer and a model selector in accordance with an embodiment of the present disclosure;

FIG. 21 is a schematic illustration of a bit stream from frames of time series data in accordance with an embodiment of the present disclosure;

FIG. 22 is a schematic illustration of an encoder with a frame classifier in accordance with an embodiment of the present disclosure;

FIG. 23 is a schematic illustration of a decoder with a frame generator in accordance with an embodiment of the present disclosure;

FIGS. 24A-24C are flow diagrams illustrating a method for low-latency communication from a first device to a second device over unreliable networks using at least one predictive machine learning model according to an embodiment of the present disclosure; and

FIG. 25 is an illustration of an exploded view of a distributed computing system or cloud computing implementation in accordance with an embodiment of the present disclosure.

In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.

DETAILED DESCRIPTION OF EMBODIMENTS

The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.

The present disclosure provides a method for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model, characterized in that the method comprises:

representing at least one frame of time series data at the first device, wherein the time series data is a series of data points indexed in time order;
recording at least one output stream, a metadata associated with the at least one output stream, and a plurality of external inputs from the first device in an interaction recorder of the second device, wherein the at least one output stream comprises the at least one frame of time series data;
segmenting a background area of an image into at least one background area stream, wherein the at least one background area stream is captured from a plurality of users;
compressing at least one character centered portion of the image into a character focus stream for enabling an output image to be treated as two streams;
training the at least one predictive machine learning model at the first device for a predictive frame regeneration by providing the at least one output stream from the interaction recorder as an input;
transmitting the results or interactions in a time series to the second device, from the first device;
detecting at least one lost frame of time series data using the at least one predictive machine learning model, at the second device;
regenerating the at least one lost frame of time series data at the second device using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and
comparing an application stream from a stream of data obtained from the unreliable network with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model at the second device using a decision engine, wherein the application stream comprises the at least one frame of time series data.

The present method thus enables the second device to regenerate at least one lost frame of the time series data using the at least one predictive machine learning model. The method also allows the first device to record at least one output stream, metadata associated with the at least one output stream, and a plurality of external inputs in an interaction recorder. Using the recorded data, the first device trains the at least one predictive machine learning model to regenerate missing data. The method treats the at least one frame of time series data passed to the interaction recorder as time series in nature, sent as quanta in packets called frames. When a frame is lost in transmission, the second device detects the lost frame of the time series data, and the at least one predictive machine learning model may regenerate the lost frame from previously received frames of the time series data. The at least one predictive machine learning model may also generate a confidence score for the regenerated frame, which is communicated back to the first device. The confidence score may be used to trigger the sending of an updated predictive machine learning model. Such an updated predictive machine learning model may be transmitted out-of-band over a reliable channel.

Additionally, when the predictive confidence score is low, the first device may provide a new predictive machine learning model. The first device continuously trains the new predictive machine learning model based on interactions observed between the first device and the second device.
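The following is a minimal sketch of how a low confidence score might trigger a request for an updated model. It assumes that the second device reports per-frame confidence scores in [0, 1]. It also assumes that a request is raised when the mean over a sliding window falls below a threshold. The class name, window length and threshold are hypothetical choices, not parameters given in this disclosure.

```python
from collections import deque


class ModelUpdateTrigger:
    """Hypothetical helper: tracks recent confidence scores of regenerated frames
    and signals when an updated predictive model should be requested."""

    def __init__(self, threshold: float = 0.6, window: int = 8):
        self.threshold = threshold
        self.scores = deque(maxlen=window)

    def observe(self, confidence: float) -> bool:
        """Record one score; return True when a full window averages below threshold."""
        self.scores.append(confidence)
        if len(self.scores) < self.scores.maxlen:
            return False
        if sum(self.scores) / len(self.scores) < self.threshold:
            self.scores.clear()  # start a fresh window so one bad run triggers only once
            return True
        return False
```

Averaging over a window, rather than reacting to a single score, avoids requesting a new model because of one poorly regenerated frame.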

The first device may be a server or a cloud server. The second device may be a client device. Further, the first device may train multiple predictive machine learning models concurrently based on different criteria, such as the first device's computational capability and the number of frames provided as input. The first device may adaptively select a predictive machine learning model to use based on conditions and resources available at the first device, such as computational power and the number of frames cached. Regenerating the lost frame of the time series data using the at least one predictive machine learning model enables low latency, because no packet needs to be retransmitted or delayed to carry redundancy information such as Forward Error Correction (FEC) codes.

In an embodiment, the first device and the second device are part of a peer-to-peer system, and the second device may be an autonomous vehicle, a robot, a multiplayer video gaming device, a virtual reality (VR)/augmented reality (AR) device, a remote music jamming system or a telepresence system.

It will be appreciated that the aforesaid method is not merely a “method of doing a mental act”, but has a technical effect in that the method functions as a form of technical control using machine learning or statistical analysis of a technical artificially intelligent system. The method involves regenerating at least one lost frame of the time series data to solve the technical problem of enabling low-latency communication while recovering time series data lost during transmission.

According to an embodiment, the method comprises combining an output stream from the application stream with the at least one regenerated frame of time series data at the second device to obtain a modified output stream.
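One possible policy for combining the streams is sketched below. It assumes that frames are keyed by sequence number and that each regenerated frame carries a confidence score. A frame that actually arrived over the unreliable network is preferred. A regenerated frame is used only where the real one is missing and its confidence meets a threshold. Any remaining gap is left as `None` for the playout side to conceal. The function `merge_streams` and its parameters are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def merge_streams(
    received: Dict[int, Sequence[float]],
    regenerated: Dict[int, Tuple[Sequence[float], float]],
    first_seq: int,
    last_seq: int,
    min_confidence: float = 0.5,
) -> List[Optional[Sequence[float]]]:
    """Hypothetical decision engine producing a modified output stream.

    For each sequence number: use the frame received from the network if present;
    otherwise use the regenerated frame if its confidence is at least
    `min_confidence`; otherwise emit None so playout can conceal the gap.
    """
    output: List[Optional[Sequence[float]]] = []
    for seq in range(first_seq, last_seq + 1):
        if seq in received:
            output.append(received[seq])
        elif seq in regenerated and regenerated[seq][1] >= min_confidence:
            output.append(regenerated[seq][0])
        else:
            output.append(None)
    return output
```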

According to an embodiment, the results or interactions in the time series comprise a state space representation or the modified output stream of the at least one frame of time series data. The state space representation comprises interactions between the first device and the second device.

According to another embodiment, the training of the at least one predictive machine learning model comprises generating a plurality of predictive machine learning models based on a number of frames in a sequence and the second device's computing capability.

According to yet another embodiment, the plurality of predictive machine learning models comprises a stream source classification model. The stream source classification model is used to identify the at least one predictive machine learning model to be used when an input is not tagged as a particular type.

According to yet another embodiment, the method comprises providing a state-space representation and the interaction between the first device and the second device as an input for training the at least one predictive machine learning model, and generating a plurality of predictive machine learning models based on the input.

According to yet another embodiment, the method comprises selecting a suitable predictive machine learning model for the predictive frame regeneration based on the second device's computing capability and a quality of the at least one regenerated frame of time series data.

According to yet another embodiment, the predictive frame regeneration comprises the at least one background area stream and the character focus stream. Both the background area stream and the character focus stream may be used to train the model for predictive frame regeneration by feeding it content from game plays that have been stored or are in progress.
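The two-stream split can be illustrated with NumPy. The sketch assumes that the character-centred region is already known as a bounding box, for example from a detector that is not shown. The hypothetical `split_streams` function only separates the streams. Compressing each stream with a suitable codec is outside the scope of this sketch.

```python
import numpy as np


def split_streams(image: np.ndarray, box: tuple) -> tuple:
    """Split one frame into a character focus stream and a background area stream.

    `box` = (top, left, bottom, right) is an assumed character bounding box.
    Returns (character_focus, background): the cropped character region, and
    the full frame with that region zeroed out. The input is not modified.
    """
    top, left, bottom, right = box
    character_focus = image[top:bottom, left:right].copy()
    background = image.copy()
    background[top:bottom, left:right] = 0
    return character_focus, background
```

A coarser encoding could then be applied to the background stream, while the character focus stream is kept at higher quality. That choice is an assumption, not something this disclosure specifies.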

According to yet another embodiment, the at least one lost frame of the time series data is detected using a frame loss indicator. The second device may use the frame loss indicator to trigger the generation of a fill frame. The lost frame signal may be provided when an audio playout queue is empty, or when the packet sequence number indicates that a packet was lost in transmission.

According to yet another embodiment, the method comprises detecting a packet lost from the at least one frame of time series data by a packet sequence number or by using a mean or median inter-arrival time. Using the inter-arrival time to detect a lost packet may ensure stable behavior of the second device and low latency, as the effects of jitter are filtered. On detecting a lost packet by either means, the second device may trigger regeneration of the lost frame of the time series data with the at least one predictive machine learning model.
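The two loss signals described above can be sketched as follows. The sketch assumes that each packet carries a monotonically increasing sequence number (wrap-around is ignored) and that arrival times are in seconds. `LossDetector` and its `timeout_factor` parameter are hypothetical. The median of recent inter-arrival gaps is used, rather than the most recent gap, because it damps the effect of jitter.

```python
import statistics
from collections import deque
from typing import List, Optional


class LossDetector:
    """Hypothetical receiver-side detector combining two loss signals: gaps in
    packet sequence numbers, and a packet that is overdue relative to the
    median of recent inter-arrival times (the median damps jitter)."""

    def __init__(self, history: int = 32, timeout_factor: float = 2.0):
        self.expected_seq: Optional[int] = None
        self.last_arrival: Optional[float] = None
        self.gaps: deque = deque(maxlen=history)
        self.timeout_factor = timeout_factor

    def on_packet(self, seq: int, arrival_s: float) -> List[int]:
        """Record an arrival; return the sequence numbers skipped before it."""
        lost: List[int] = []
        if self.expected_seq is None or seq >= self.expected_seq:
            if self.expected_seq is not None:
                lost = list(range(self.expected_seq, seq))
            self.expected_seq = seq + 1
        # A late (out-of-order) packet reports nothing; it may replace a fill frame.
        if self.last_arrival is not None:
            self.gaps.append(arrival_s - self.last_arrival)
        self.last_arrival = arrival_s
        return lost

    def overdue(self, now_s: float) -> bool:
        """True if the next packet is later than timeout_factor x median gap."""
        if self.last_arrival is None or not self.gaps:
            return False
        return now_s - self.last_arrival > self.timeout_factor * statistics.median(self.gaps)
```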

According to yet another embodiment, the method comprises calibrating an acoustic model with a decoder, wherein the acoustic model enables the decoder to regenerate the at least one lost frame of the time series data when data is lost. The acoustic model may enable the decoder to generate replacement audio frames on a best-effort basis and to reduce the noise effects of the lost data. However, the acoustic model may not produce audio that is authentic and specific (i.e. high fidelity) to the nature of the audio stream being played. The second device may use techniques such as Forward Error Correction (FEC), in which data from the previous packet is embedded in a subsequent packet. The decoder may use this data to regenerate the lost audio frame for the lost packet when the subsequent packet arrives. This technique, however, introduces latency, as the decoder may have to wait for the FEC packet to determine how to proceed. If FEC is not used, the decoder generates a replacement or fill frame when the audio playout side requests the next frame.
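The latency cost of FEC-style redundancy can be seen in a minimal sketch in which each packet also carries a verbatim copy of the previous frame's payload. This is a simplification, since practical schemes often embed a lower-bitrate copy instead. The function names are hypothetical. A lost frame can only be recovered once the next packet arrives, which is the extra wait described above.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# (sequence number, payload, redundant copy of the previous frame's payload)
Packet = Tuple[int, bytes, Optional[bytes]]


def add_redundancy(payloads: Sequence[bytes]) -> List[Packet]:
    """Sender side: every packet also carries the previous frame's payload."""
    packets: List[Packet] = []
    previous: Optional[bytes] = None
    for seq, payload in enumerate(payloads):
        packets.append((seq, payload, previous))
        previous = payload
    return packets


def recover(received: Dict[int, Packet], seq: int) -> Optional[bytes]:
    """Receiver side: return frame `seq` directly if it arrived; if it was lost,
    return the redundant copy from packet seq + 1, which is only possible after
    waiting for that packet. Returns None when neither is available."""
    if seq in received:
        return received[seq][1]
    following = received.get(seq + 1)
    return following[2] if following is not None else None
```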

The acoustic model may be static. As the packet loss rate increases, the ability of the acoustic model to produce good-quality replacement frames diminishes rapidly, and the audio quality experienced by the user also diminishes significantly. The acoustic model is not context- or content-aware and therefore may not generate frames that are best suited to the content of the audio stream. An acoustic model is defined as a model that is used in automatic speech recognition to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The acoustic model is learned from a set of audio recordings and their corresponding transcripts.

According to yet another embodiment, the method comprises producing fill frames using different numbers of input frames as an input vector to the at least one predictive machine learning model to generate an output frame, wherein the output frame is of a different frame size. The at least one predictive machine learning model may be trained with specific stream content, for example a saxophone, a drum, and the like.
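The sketch below uses a least-squares linear predictor as a stand-in for a trained predictive model. It maps the last `n_in` frames, flattened into one input vector, to the next `out_size` samples, so the output frame size need not equal the input frame size. The class `LinearFramePredictor` is hypothetical. A production system would more likely use a neural network trained on specific stream content, as described above.

```python
import numpy as np


class LinearFramePredictor:
    """Hypothetical stand-in for a predictive model: a least-squares linear map
    from the last `n_in` frames of `frame_size` samples (flattened into one input
    vector) to the next `out_size` samples, which may differ from `frame_size`."""

    def __init__(self, n_in: int, frame_size: int, out_size: int):
        self.span = n_in * frame_size  # length of the input vector
        self.out_size = out_size
        self.weights = None

    def fit(self, signal: np.ndarray) -> None:
        """Fit on a 1-D array of recorded samples (e.g. one instrument's audio)."""
        cuts = range(self.span, len(signal) - self.out_size + 1)
        X = np.stack([signal[i - self.span:i] for i in cuts])
        Y = np.stack([signal[i:i + self.out_size] for i in cuts])
        X = np.hstack([X, np.ones((len(X), 1))])  # bias column
        self.weights, *_ = np.linalg.lstsq(X, Y, rcond=None)

    def predict(self, history: np.ndarray) -> np.ndarray:
        """Generate a fill frame from the most recent samples in `history`."""
        if self.weights is None:
            raise RuntimeError("model has not been fitted")
        x = np.append(history[-self.span:], 1.0)
        return x @ self.weights
```

For a pure sinusoid, the next samples are an exact linear combination of the previous ones, so this predictor reproduces such a signal closely. Real instrument audio would need a richer model.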

According to yet another embodiment, fill frames in a frame queue are replaced by actual frames that arrive later, improving the accuracy of subsequent frames generated from real-time time series data.
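The following is a minimal sketch of a receiver-side frame history. Fill frames are marked as synthetic and are overwritten when the real frame arrives late, so that later predictions draw on real data. The `predictor` callable stands in for the at least one predictive machine learning model. The class and its `context` parameter are hypothetical, and pruning of old frames is omitted.

```python
from typing import Callable, Dict, List, Set


class FrameHistory:
    """Hypothetical receiver-side frame history used as model input. Fill frames
    are marked synthetic and overwritten when the real frame arrives late, so
    later predictions draw on real data where possible."""

    def __init__(self, predictor: Callable[[List[list]], list], context: int = 4):
        self.frames: Dict[int, list] = {}
        self.synthetic: Set[int] = set()
        self.predictor = predictor  # stands in for the predictive model
        self.context = context      # number of preceding frames fed to the model

    def on_arrival(self, seq: int, frame: list) -> None:
        """Store a real frame, replacing any fill frame generated for `seq`."""
        self.frames[seq] = frame
        self.synthetic.discard(seq)

    def fill(self, seq: int) -> list:
        """Generate, store and return a fill frame for a missing `seq`."""
        recent = [self.frames[s] for s in range(seq - self.context, seq) if s in self.frames]
        frame = self.predictor(recent)
        self.frames[seq] = frame
        self.synthetic.add(seq)
        return frame
```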

According to yet another embodiment, the method comprises training the at least one predictive machine learning model with specific stream content. The specific audio content may be a saxophone, a drum and the like.

According to yet another embodiment, the method comprises bundling models associated with the specific stream content, different input frame sizes and different output frame sizes into one package based on the second device's computing capability.

In an embodiment, the at least one predictive machine learning model includes a bundle model.

In another embodiment, a plurality of bundle models may be created by a second device in one package called an ensemble. Each bundle model in the ensemble is associated with a different class of time series data stream, and each bundle model is trained to accept a range of input frame sizes and output frame sizes. During frame generation, a subset of models selected from the package is initially used to generate a fill frame, and the package is selected based on the second device's computing capability, such as CPU power, a machine learning computation engine, and the like. Each bundle model generates both audio data and a confidence score for the quality of the audio data being produced. The final audio output is chosen based on the confidence score each bundle model assigns to its generated audio data. If the second device has limited computing capability, a smaller set of bundle models or less complex bundle models is selected for use. Once a bundle model generates frames with a confidence score above a specified confidence threshold, that bundle model is reused in subsequent fill-frame generation. The selected bundle model is periodically reset so that the second device can retest the package models to determine whether a better-fitting model can be found.
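The selection behavior described above can be sketched as follows: try the ensemble, reuse a model once it exceeds the confidence threshold, and reset periodically. Each bundle model is assumed to be a callable that returns a generated frame and a confidence score. `EnsembleSelector` and `reset_every` are assumptions, not details given in this disclosure. The same applies to falling back to the full ensemble when the reused model's confidence drops.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A bundle model maps recent frames to (generated frame, confidence in [0, 1]).
BundleModel = Callable[[Sequence[list]], Tuple[list, float]]


class EnsembleSelector:
    """Hypothetical ensemble selector: evaluate all bundle models, lock onto one
    whose confidence exceeds the threshold, and periodically reset the lock so
    the whole package is retested."""

    def __init__(self, models: List[BundleModel], threshold: float = 0.8, reset_every: int = 100):
        self.models = models
        self.threshold = threshold
        self.reset_every = reset_every
        self.locked: Optional[int] = None  # index of the model being reused
        self.count = 0

    def generate(self, recent: Sequence[list]) -> Tuple[list, float]:
        self.count += 1
        if self.count % self.reset_every == 0:
            self.locked = None  # periodic reset: retest the package models
        if self.locked is not None:
            frame, confidence = self.models[self.locked](recent)
            if confidence > self.threshold:
                return frame, confidence
            self.locked = None  # assumption: fall back to the full ensemble
        results = [model(recent) for model in self.models]
        best = max(range(len(results)), key=lambda i: results[i][1])
        if results[best][1] > self.threshold:
            self.locked = best
        return results[best]
```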

The second device may incur only compute latency and no buffering latency, because fill-frame generation involves no buffering; the frame is generated computationally. As long as the at least one predictive machine learning model can complete its computation within the time the playout queue takes to consume the generated frame, no latency lag is incurred. By dynamically trimming the selection of predictive machine learning models to match the computational capability of the second device that generates the frame, the second device may ensure that frame generation completes within that time.
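The compute-budget trimming can be sketched as choosing the highest-quality model whose estimated inference time fits within a fraction of one frame duration. The `ModelProfile` fields and the `safety` margin are hypothetical. So is the assumption that per-model compute times have been measured on the second device.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelProfile:
    name: str
    est_compute_ms: float  # assumed: measured inference time on the second device
    quality: float         # assumed: e.g. a validation score; higher is better


def select_model(profiles: List[ModelProfile], frame_ms: float, safety: float = 0.8) -> Optional[ModelProfile]:
    """Return the highest-quality model whose estimated compute time fits within
    `safety * frame_ms`, so a fill frame is ready before the playout queue needs it.
    Returns None if no model fits the budget."""
    budget = frame_ms * safety
    feasible = [p for p in profiles if p.est_compute_ms <= budget]
    return max(feasible, key=lambda p: p.quality, default=None)
```

With a 10 ms frame and the default 0.8 margin, for instance, only models estimated to finish within 8 ms are eligible.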

According to yet another embodiment, the method comprises generating a confidence score of the quality of the at least one regenerated frame of time series data regenerated by the at least one predictive machine learning model, wherein the at least one regenerated frame of time series data with a high confidence score beyond a specified confidence threshold is reused in a subsequent fill-frame generation. The final output is based on the confidence score of the quality of the regenerated frame of time series data regenerated by the at least one predictive machine learning model. If the second device has limited computing capability, a smaller set of predictive machine learning models or less complex predictive machine learning models may be selected for use. A periodic reset of the selected predictive machine learning model may be performed so that the second device may retest the package models to determine whether a better model is available.

In an embodiment, when the data sent between communicating parties (e.g. the first device and the second device) carries time series data, and the signal has a structure such that the at least one predictive machine learning model may be trained to predict and regenerate data lost in transmission, the present method may produce the lowest latency between the communicating parties.

Some example applications or systems that may benefit from the present method are as follows:

In a low-latency music and audio transmission system, instruments and music have sufficiently general structure that the at least one predictive machine learning model may be trained, and the second device may predict or generate the next frame of audio once the system has received an earlier frame of the audio.

In an autonomous and remote vehicle control system, for example a closed control loop system with fast-moving vehicles, position and control information is communicated with low latency. With the at least one predictive machine learning model, the vehicle control information and data may be communicated as time series data among vehicles, or to a network controller at a central or edge node. Both the vehicle and the controller may need to regenerate data lost in transmission over an unreliable network. An online system that learns and refines the at least one predictive machine learning model from the time series control information and data passed may allow the autonomous and remote vehicle control system to use an unreliable channel or network for communication and still realize low-latency communication.

High-frequency trading systems depend on low latency. They may attempt to meet this need by being placed geographically close to the trading center data feed while using a reliable transport. Stock data is fundamentally time series data, so the at least one predictive machine learning model may be used to regenerate lost data, which may allow a greater physical distance from the trading data feed.

In a Virtual Reality (VR) and Augmented Reality (AR) system, low latency may be needed to ensure that consumers have an optimal experience. When the VR/AR system involves data (e.g. time series data) from a source remote to a headset or wearable of the VR/AR system, the data may need to be communicated with low latency. Most consumer home environments are noisy, so the communication channel or network is unreliable. Additionally, most data transmitted for VR/AR, such as audio, video, position, and sensory data, may be communicated in a time series format. The at least one predictive machine learning model may be trained to regenerate the lost data while preserving the low-latency property.

In a low-latency interactive video system, such as in cloud video gaming, the video game is controlled by users interacting remotely with the interactive video system. In an embodiment, the first device may execute the video game and produce the video that is streamed to the users. The interactive video system may also generate gameplay audio that is streamed to the users. The gameplay audio and the video may be streamed at low latency over an unreliable network. A user's device (i.e. the second device) may receive the gameplay audio and the video, and decode and play them on the user's device. Simultaneously, the user may react to the gameplay audio or the video content using an input device, such as a game controller, a keyboard, a mouse, a VR headset, motion sensors, etc., which may communicate game control signals at low latency to the first device. Both directions of communication (i.e. from the first device to the user device and from the user device to the first device) require low latency, often over an unreliable network.

The present disclosure also provides a first device that enables low-latency communication with a second device over an unreliable network using at least one predictive machine learning model, characterized in that the first device comprises: one or more processors;

one or more non-transitory computer-readable mediums storing one or more sequences of instructions, which when executed by the one or more processors, cause:
representing at least one frame of time series data at the first device, wherein the at least one frame of time series data is a series of data points indexed in time order;
recording at least one output stream, a metadata associated with the at least one output stream, and a plurality of external inputs in an interaction recorder of the second device, wherein the at least one output stream comprises the at least one frame of time series data;
segmenting a background area of an image into at least one background area stream, wherein the at least one background area stream is captured from a plurality of users;
compressing at least one character centered portion of the image into a character focus stream for enabling an output image to be treated as two streams;
training the at least one predictive machine learning model for predictive frame regeneration by providing the at least one output stream from the interaction recorder as an input; and

transmitting results or interactions in a time series to the second device.

The advantages of the present first device are thus identical to those disclosed above in connection with the present method, and the embodiments listed above in connection with the present method apply mutatis mutandis to the present first device.

In an example embodiment, a video output of a game that is rendered in the first device may be predicted, because the game is finite and defines or limits what is to be generated, and users often follow familiar paths or rail tracks along which large portions of the rendered scenes show the same view. These large portions of the video image may be predicted with high confidence using the at least one predictive machine learning model. Such video images often include background textures over which gameplay is layered, while the parts of the screen occupied by characters are usually smaller. The video image is segmented into the background area stream, which is compressed separately from the character centered portions, so that the output video image can be treated logically as two video streams. The background area stream may be compressed and streamed with low latency supported by the predictive frame regeneration. The at least one predictive machine learning model may be trained using background area streams captured from a large number of users.
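The split into a background area stream and a character focus stream can be illustrated on a toy frame. This sketch assumes the character region is a single known bounding box (for example, reported by the game engine) and represents a frame as a list of pixel rows; `split_streams`, `merge_streams`, the box convention, and the zero fill value are hypothetical, and a real system would compress each stream separately.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def split_streams(image: Image, box: Box, fill: int = 0) -> Tuple[Image, Image]:
    """Return (background with the character box blanked, character crop)."""
    top, left, bottom, right = box
    background = [row[:] for row in image]
    for r in range(top, bottom):
        for c in range(left, right):
            background[r][c] = fill
    character = [row[left:right] for row in image[top:bottom]]
    return background, character


def merge_streams(background: Image, character: Image, box: Box) -> Image:
    """Paste the character crop back over the background."""
    top, left, bottom, right = box
    out = [row[:] for row in background]
    for i, r in enumerate(range(top, bottom)):
        out[r][left:right] = character[i]
    return out
```

Because the blanked background changes little along familiar paths, it is the part a predictive model can most readily regenerate, while the small character crop carries the less predictable content.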

Both the background area stream and the character focus stream may be used to train the predictive frame regeneration by feeding content from gameplay that has been stored or is in progress.

In an embodiment, the predictive frame regeneration comprises the at least one background area stream and the character focus stream. Both the background area stream and the character focus stream may be used to train the predictive frame regeneration by feeding content from gameplay that has been stored or is in progress.

According to an embodiment, the one or more processors are further configured to train the at least one predictive machine learning model with specific stream content. The specific stream content may be, for example, a saxophone, a drum, and the like.

The present disclosure also provides a second device that enables low-latency communication with a first device over an unreliable network using at least one predictive machine learning model, characterized in that the second device comprises:

one or more processors;
one or more non-transitory computer-readable mediums storing one or more sequences of instructions, which when executed by the one or more processors, cause:
receiving the results or interactions in the time series from the first device, wherein the results or interactions comprise a state space representation or the modified output stream of the at least one frame of time series data, wherein the state space representation comprises interactions between the first device and the second device;
detecting at least one lost frame of time series data using the at least one predictive machine learning model;
regenerating the at least one lost frame of the time series data using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and
comparing an application stream from a stream of data obtained from the unreliable network with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model using a decision engine, wherein the application stream comprises the at least one frame of time series data.

The advantages of the present second device are thus identical to those disclosed above in connection with the present method, and the embodiments listed above in connection with the present method apply mutatis mutandis to the present second device.

The second device may detect the lost frame of the time series data using previously received frames of time series data. The at least one predictive machine learning model may then generate the lost frame of the time series data, together with a confidence score for the regenerated frame that is communicated back to the second device. This predictive machine learning model-based approach to regenerating lost frames of time series data may enable low latency, because the second device does not need to retry or delay the transmission of a packet in order to carry redundancy information such as Forward Error Correction (FEC) codes.

According to an embodiment, the one or more processors are further configured to combine an output stream from the application stream with the at least one regenerated frame of time series data to obtain a modified output stream.
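One way to picture the modified output stream is a per-frame merge keyed by sequence number, which prefers frames that actually arrived in the application stream and falls back to regenerated frames for the gaps. The function name, the dictionary representation, and the "received frame wins" policy are assumptions for illustration only.

```python
from typing import Dict, List, Tuple

Frame = List[float]


def modified_output_stream(received: Dict[int, Frame], regenerated: Dict[int, Frame],
                           first: int, last: int) -> List[Tuple[str, Frame]]:
    """Return (source, frame) for every sequence number in [first, last]."""
    out = []
    for seq in range(first, last + 1):
        if seq in received:
            out.append(("received", received[seq]))  # real data takes priority
        elif seq in regenerated:
            out.append(("regenerated", regenerated[seq]))  # fill the gap
        else:
            raise KeyError(f"frame {seq} neither received nor regenerated")
    return out
```

Tagging each frame with its source leaves room for a decision engine to weigh regenerated frames differently, for example by their confidence scores.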

According to another embodiment, the at least one lost frame of the time series data is detected using a frame loss indicator.
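A frame loss indicator is commonly derived from sequence numbers. Assuming each frame carries a 16-bit sequence number that wraps around (as RTP sequence numbers do), the gap between consecutive arrivals identifies the lost frames. Treating a large backward jump as a late or reordered packet rather than a loss is an assumed policy in this sketch.

```python
from typing import List

SEQ_MOD = 1 << 16  # 16-bit sequence space


def lost_sequence_numbers(prev_seq: int, new_seq: int) -> List[int]:
    """Sequence numbers skipped between two consecutively received frames."""
    gap = (new_seq - prev_seq) % SEQ_MOD
    if gap == 0 or gap >= SEQ_MOD // 2:
        return []  # duplicate, or a late/reordered frame: nothing new is lost
    return [(prev_seq + i) % SEQ_MOD for i in range(1, gap)]


print(lost_sequence_numbers(10, 13))    # [11, 12]
print(lost_sequence_numbers(65534, 1))  # [65535, 0]
```

Each returned sequence number is a slot that the predictive model would be asked to fill.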

According to yet another embodiment, the one or more processors are further configured to calibrate an acoustic model into a decoder, wherein the acoustic model enables the decoder to regenerate the at least one lost frame of the time series data from lost data. In an example embodiment, in a low-latency music and audio transmission system, the acoustic model may enable the decoder to generate best-effort replacements for lost audio frames and reduce the noise effects of the lost data. However, the acoustic model may not produce audio that is authentic and specific (i.e. high fidelity) to the nature of the audio stream being transmitted. The second device may use techniques such as Forward Error Correction (FEC), where data from the previous packet is embedded in a subsequent packet. The decoder may use this data to regenerate the lost audio frame for the lost packet when the subsequent packet arrives. FEC, however, introduces latency, as the decoder may have to wait for the FEC packet to determine how to proceed. If FEC is not used, the decoder generates a replacement or fill frame when the audio playout side requests the next frame.
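The latency cost of this FEC scheme shows up in a toy model where every packet also carries a copy of the previous frame: a lost frame n is only recoverable after packet n + 1 has arrived, i.e. at least one packet interval later. This is a simplified illustration; the packet dictionary layout and the full-copy redundancy are assumptions, not the format of any particular codec.

```python
from typing import Dict, List, Optional


def packetize_with_redundancy(frames: List[str]) -> List[dict]:
    """Each packet carries its own frame plus a copy of the previous frame."""
    packets = []
    for seq, frame in enumerate(frames):
        prev: Optional[str] = frames[seq - 1] if seq > 0 else None
        packets.append({"seq": seq, "frame": frame, "prev": prev})
    return packets


def recover(received: List[dict]) -> Dict[int, str]:
    """Rebuild frames; lost frame n needs packet n + 1, so it arrives late."""
    by_seq = {p["seq"]: p for p in received}
    if not by_seq:
        return {}
    out = {}
    for seq in range(max(by_seq) + 1):
        if seq in by_seq:
            out[seq] = by_seq[seq]["frame"]
        elif seq + 1 in by_seq:
            out[seq] = by_seq[seq + 1]["prev"]  # had to wait one packet interval
    return out
```

When two consecutive packets are lost, the first of the two cannot be recovered at all, which is where a fill frame would still be needed.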

The acoustic model may be static. As the packet loss rate increases, the ability of the acoustic model to produce good quality replacement frames diminishes rapidly, and the audio quality experienced by the user also diminishes significantly. The acoustic model is not context or content aware and therefore may not generate frames that are best suited to the content of the audio stream. An acoustic model is defined as a model that is used in automatic speech recognition to represent the relationship between an audio signal and the phonemes or other linguistic units that make up speech. The acoustic model is learned from a set of audio recordings and their corresponding transcripts.

According to yet another embodiment, the one or more processors are further configured to produce fill frames using different numbers of input frames as an input vector to the at least one predictive machine learning model to generate an output frame, wherein the output frame is of a different frame size. The at least one predictive machine learning model may be trained with specific stream content, for example, a saxophone, a drum, and the like.
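The input-vector and output-size mechanics can be sketched as follows. `make_input_vector` concatenates the last `k` received frames, and the model is any callable mapping that vector to an output frame of a requested size. `decaying_tail` is a hypothetical placeholder standing in for a trained predictive model; it is not a real predictor.

```python
from typing import Callable, List, Sequence

Frame = List[float]


def make_input_vector(history: Sequence[Frame], k: int) -> Frame:
    """Flatten the k most recent frames into one input vector."""
    if len(history) < k:
        raise ValueError("not enough history for the requested input size")
    return [sample for frame in history[-k:] for sample in frame]


def decaying_tail(input_vector: Frame, out_size: int, decay: float = 0.5) -> Frame:
    """Placeholder model: repeat the last out_size samples, attenuated."""
    if not 0 < out_size <= len(input_vector):
        raise ValueError("output size must be between 1 and the input length")
    return [decay * x for x in input_vector[-out_size:]]


def fill_frame(history: Sequence[Frame], k: int, out_size: int,
               model: Callable[[Frame, int], Frame] = decaying_tail) -> Frame:
    """Build a k-frame input vector and ask the model for an out_size frame."""
    return model(make_input_vector(history, k), out_size)


print(fill_frame([[1, 2], [3, 4], [5, 6]], k=2, out_size=3))  # [2.0, 2.5, 3.0]
```

Here two 2-sample input frames yield one 3-sample output frame, illustrating that input count and output size are independent parameters.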

According to yet another embodiment, the one or more processors are further configured to associate a bundle model with the specific stream content, different input frame sizes, and output frames into one package based on its computing capability.

In an embodiment, the at least one predictive machine learning model includes a bundle model.

In another embodiment, the second device creates a plurality of bundle models in one package called an ensemble. Each bundle model in the ensemble is associated with a different class of time series data stream, and each bundle model is trained to accept a range of input frame sizes and output frame sizes. During frame generation, a subset of models selected from the package is initially used to generate a fill frame, and the package is selected based on the second device's computing capability, such as CPU power, a machine learning computation engine, and the like. Each bundle model generates both audio data and a confidence score of the quality of the audio data being produced. The final audio output is selected based on the confidence score that each bundle model reports for its generated audio data. If the second device has limited computing capability, a smaller set of bundle models or less complex bundle models is selected for use. Once a bundle model generates frames with a confidence score above a specified confidence threshold, that bundle model is reused in subsequent fill-frame generation. The selected bundle model is periodically reset so that the second device may retest the package models to determine whether a better-fitting model can be found.

The second device may incur only compute latency and no buffering latency, because fill-frame generation requires no buffering: the frame is generated computationally. As long as the at least one predictive machine learning model can complete its computation within the time the playout queue takes to consume the generated frame, no latency lag is incurred. By dynamically trimming the selection of predictive machine learning models to match the computational capability of the second device that generates the frame, the second device may ensure that frame generation completes within that time. In an embodiment, the second device is further configured to train the at least one predictive machine learning model with specific stream content. The specific stream content may be a saxophone, a drum, or the like.

According to an embodiment, the one or more processors are further configured to generate a confidence score of the quality of the at least one regenerated frame of time series data regenerated by the at least one predictive machine learning model, wherein the at least one regenerated frame of time series data with a high confidence score beyond a specified confidence threshold is reused in a subsequent fill-frame generation.

The present disclosure also provides a computer program product comprising instructions to cause the first device and the second device to carry out the above-described method.

The advantages of the present computer program product are thus identical to those disclosed above in connection with the present method, and the embodiments listed above in connection with the present method apply mutatis mutandis to the computer program product.

Embodiments of the present disclosure may enable the second device to regenerate at least one lost frame of the time series data based on the at least one output stream using the at least one predictive machine learning model. Embodiments of the present disclosure also allow the first device to record at least one output stream, a metadata associated with the at least one output stream, and a plurality of external inputs from the first device in an interaction recorder. Using the recorded data, the first device trains the at least one predictive machine learning model to regenerate the missing data. Embodiments of the present disclosure may treat the at least one frame of time series data passed through the interaction recorder as time series in nature, sent as quanta in packets called frames. When a frame is lost in transmission, embodiments of the present disclosure may enable the second device to detect the lost frame of the time series data, and, using previously received frames of time series data, the at least one predictive machine learning model may generate the lost frame of the time series data.

DETAILED DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustration of low-latency peer-to-peer communication in accordance with an embodiment of the present disclosure. The low-latency peer-to-peer communication comprises a first device 102, an internet or a network 104, and a second device 106. The internet or the network 104 may be an unreliable network channel. The functions of these parts have been described above.

FIG. 2 is a schematic illustration of low-latency server to client device communication in accordance with an embodiment of the present disclosure. The low-latency server to client device communication comprises a server 202 (e.g. a first device), an internet or a network 204, and a client device 206 (e.g. a second device). The internet or the network 204 may be an unreliable network channel. The functions of these parts have been described above.

FIG. 3 is a schematic illustration of a cloud game interactive system that comprises a first device and a second device in accordance with an embodiment of the present disclosure. The first device 302 comprises a game execution system 306 that executes a game, a video output 308 that outputs video of the game, an audio output 310 that outputs audio of the game, a first game input control 312 that controls the game based on control information received from the second device 304, a video encoder and streamer 314 that encodes the video of the game for streaming, and an audio encoder and streamer 316 that encodes the audio of the game for streaming. The second device 304 comprises a video stream decoder 320 that decodes the video stream of the game, an audio stream decoder 322 that decodes the audio stream of the game, game input controls 324 that provide the control information to control the game, a video output and display 326, and an audio output 328. The first device 302 communicates with the second device 304 over an unreliable network 318. A user 330 may interact with the second device 304 to provide control information to the first device 302. The functions of the other parts have been described above.

FIG. 4 is a schematic illustration of the cloud game interactive system that comprises an interaction recorder in accordance with an embodiment of the present disclosure. The cloud game interactive system 402 comprises an interaction execution system 404, at least one predictive machine learning model 406, one or more output streams 408A-408N, a state space representation 410, an input stream 412, an interaction recorder 414, metadata 416, an internet or a network 418, a bundle of streams 420, a recorded state space representation 422, and an external input 424. The interaction execution system 404 executes the input stream 412 with the at least one predictive machine learning model 406 and outputs the one or more output streams 408A-408N and the state space representation 410. The one or more output streams 408A-408N comprise frames of time series data. The interaction recorder 414 records the one or more output streams 408A-408N, the state space representation 410, the input stream 412, and the metadata 416, and outputs the bundle of streams 420, which may include one or more frames of time series data. The one or more frames of time series data are stored in the recorded state space representation 422. The functions of the other parts have been described above.

FIG. 5 is a schematic illustration of a multi-interactive application instance system that comprises an interaction recorder in accordance with an embodiment of the present disclosure. The multi-interactive application instance system comprises one or more external inputs 502A-502N, one or more applications with interaction recorders 504A-504N, one or more bundles of streams 506A-506C, and a network connected/recorded state space representation 508. The multi-interactive application instance system may learn user responses by analysing a rendered output video of a first device and/or an instantaneous application state as input to at least one predictive machine learning model. The interactive application state variables represent a unique output that renders at least one frame of time series data (e.g. the audio and video) being generated. The state variables may describe the state space representation of the closed-loop control defined by the user's interaction with the application that drives the creation of the audio and video. The state space representation may be smaller than the at least one frame of time series data and therefore represents a more efficient form of data for training the at least one predictive machine learning model. The state space representation may also be sent to a second device. With the state of the game controller as input, along with the application state, the first device may speculatively generate at least one lost frame of the time series data at the second device. The speculative generation of at least one lost frame of the time series data may create an even lower-latency interaction, as the frames are produced at the second device. Subsequent states or frames received from the first device may be used to adjust the speculative generator based on the correctness of its predictions. The functions of the other parts have been described above.

FIG. 6 is a schematic illustration of a first device 602 with the interaction recorder in accordance with an embodiment of the present disclosure. The first device 602 comprises a game execution system 604, game input controls 606, a video output 608, an audio output 610, a video encoder and streamer 612, an audio encoder and streamer 614, an external input 616, a state space representation 618, an interaction recorder 620, an internet or a network 622, a bundle of streams 624, a network connected/recorded state space representation store 626, and metadata 628. The game input controls 606 obtain the external input 616 from the low-latency communication from the first device 602 to a second device over an unreliable network. The game input controls 606 detect a lost frame of the time series data and regenerate the lost frame of the time series data using at least one predictive machine learning model. The game execution system 604 executes the regenerated lost frame of time series data and outputs the video output 608, the audio output 610, and the state space representation 618. The video output 608 and the audio output 610 are encoded in the video encoder and streamer 612 and the audio encoder and streamer 614, respectively. The interaction recorder 620 records the encoded video, the encoded audio, the metadata 628, and the state space representation 618, and then outputs the bundle of streams 624. The bundle of streams 624 is stored in the network connected/recorded state space representation store 626. The functions of the other parts have been described above.

FIG. 7 is a schematic illustration of at least one predictive machine learning model that is trained using at least one frame of time series data from the interaction recorder in accordance with an embodiment of the present disclosure. The at least one predictive machine learning model obtains input from a network connected/recorded state space representation store 702. The state space representation 704 is then used to train the at least one predictive machine learning model 706 to obtain a trained predictive machine learning model 708 that regenerates a lost frame of the time series data. The functions of the other parts have been described above.

FIG. 8 is a schematic illustration of a first device 802 with an interaction recorder 822 in accordance with an embodiment of the present disclosure. The first device 802 comprises a game execution system 804, an input loss detection and regeneration model 806 (e.g. at least one predictive machine learning model), a video output 808, an audio output 810, a video encoder and streamer 812, an audio encoder and streamer 814, an input decoder 816, an external input 818, a state space representation 820, the interaction recorder 822, metadata 824, a bundle of streams 826, and a state space representation store 828. The input decoder 816 decodes the external input 818. The input loss detection and regeneration model 806 obtains a decoded external input from the input decoder 816. The input loss detection and regeneration model 806 detects a lost frame of the time series data and regenerates the lost frame of the time series data. The game execution system 804 executes the regenerated lost frame of the time series data and outputs the video output 808, the audio output 810, and the state space representation 820. The video output 808 and the audio output 810 are encoded in the video encoder and streamer 812 and the audio encoder and streamer 814, respectively. The interaction recorder 822 records the encoded video, the encoded audio, the metadata 824, and the state space representation 820, and then outputs the bundle of streams 826. The bundle of streams 826 is stored in the state space representation store 828. The functions of the other parts have been described above.

FIG. 9 is a schematic illustration of a second device that receives a low-latency stream over an unreliable network in accordance with an embodiment of the present disclosure. The second device 904 receives a low-latency stream from a first device 902 over an unreliable network 906. The illustrated arrangement comprises the first device 902, an output stream 903, a first input 905, the internet or the unreliable network 906, an input block 908, a second input 910, a stream 912 from the unreliable network 906, at least one predictive machine learning model 914, an application stream 916, a generated stream 918, a decision engine 920, an output stream 922, and an output 924. The first device 902 inputs the output stream 903 and the first input 905 to the internet or the unreliable network 906. The internet or the unreliable network 906 obtains input from the input block 908, which in turn obtains input from the second input 910. The internet or the unreliable network 906 outputs an application stream to the stream 912 from the unreliable network 906 and to the at least one predictive machine learning model 914, which provides the regenerated lost frame of the time series data. The application stream comprises at least one frame of the time series data. The application stream and the regenerated lost frame of the time series data are compared using the decision engine 920 to produce the output stream 922.

FIG. 10 is a schematic illustration of an architecture of a low-latency audio stream encoder system in accordance with an embodiment of the present disclosure. The architecture of the low-latency audio stream encoder system comprises audio 1002, a digitizer 1004, a framer 1006, an encoder 1008, a bitstream 1010, a data loss model 1012, a packetizer 1014, an error encoder (FEC) 1016, a network transmitter 1018, and an internet or a network 1020. The audio 1002 is digitized in the digitizer 1004, encoded using the error encoder (FEC) 1016, and transmitted to the internet or the network 1020 through the network transmitter 1018. The functions of these parts are known in the art.

FIG. 11 is a schematic illustration of an architecture of a low-latency audio stream decoder system in accordance with an embodiment of the present disclosure. The architecture of the low-latency audio stream decoder system comprises an internet or a network 1102, a network receiver 1104, a packet jitter buffer 1106, a depacketizer 1108, a decoder and framer 1110, a packet loss indicator 1112, an audio frame 1114, a valid frame selector 1116, decoded frames 1118, a playout audio frame 1120, a speaker 1122, a frame cache 1124, and an acoustic model 1126. The network receiver 1104 receives a state space representation from the internet or the unreliable network 1102. The acoustic model 1126 may be calibrated with the decoder and framer 1110, and the acoustic model 1126 is static. The acoustic model 1126 may enable the decoder and framer 1110 to regenerate at least one frame of time series data from lost data. The functions of these parts are known in the art.

FIG. 12 is a schematic illustration of a predictive machine learning model training system in accordance with an embodiment of the present disclosure. The predictive machine learning model training system comprises an audio file 1202, file metadata 1204, a read audio frames block 1206, a plurality of frames 1208A-1208G, a neural network model trainer 1210, a classifier model 1212, and a generator model 1214. The audio file 1202 is input to the file metadata 1204 and to the read audio frames block 1206, which divides the audio file 1202 into the plurality of frames 1208A-1208G. The generator model 1214 is then trained using the neural network model trainer 1210, and the generator model 1214 generates an updated predictive machine learning model to regenerate at least one frame of time series data.
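The framing step in FIG. 12 can be sketched as turning a sample sequence into fixed-size frames and then into supervised (context, target) pairs for a generator model. The frame size, the context length, and dropping a trailing partial frame are assumptions; the neural network trainer itself is out of scope here.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def split_into_frames(samples: Sequence[float], frame_size: int) -> List[Frame]:
    """Cut samples into consecutive frames; a trailing partial frame is dropped."""
    count = len(samples) // frame_size
    return [list(samples[i * frame_size:(i + 1) * frame_size]) for i in range(count)]


def training_pairs(frames: Sequence[Frame], context: int) -> List[Tuple[List[Frame], Frame]]:
    """Pair each target frame with the `context` frames that precede it."""
    return [(list(frames[i - context:i]), frames[i]) for i in range(context, len(frames))]


frames = split_into_frames(list(range(15)), frame_size=2)  # 7 frames; sample 14 dropped
pairs = training_pairs(frames, context=3)
print(len(pairs), pairs[0])  # 4 ([[0, 1], [2, 3], [4, 5]], [6, 7])
```

Each pair asks the model to predict a frame from the frames before it, which mirrors the situation at the receiver when that frame is lost.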

FIG. 13 is a schematic illustration of an encoder with a classifier model in accordance with an embodiment of the present disclosure. The encoder comprises audio 1302, a digitizer 1304, a frame generator model training engine 1306, read and updates 1308, a push updated model 1310, a model store 1312, an out-band model 1314, a frame generator selector 1316, an in-band model 1318, a framer 1320, an encoder 1322, a bitstream 1324, a packetizer 1326, a network transmitter 1328, and an internet or a network 1330. Previously received frames of time series data are digitized and framed by the digitizer 1304 and the framer 1320, respectively. The model store 1312 comprises at least one predictive machine learning model. The frame generator model training engine 1306 trains the at least one predictive machine learning model based on the previously received frames of time series data. The frame generator selector 1316 selects a suitable predictive machine learning model for regenerating at least one lost frame of the time series data based on at least one output stream. The functions of the other parts are known in the art.

FIG. 14 is a schematic illustration of a decoder with a frame generator model training engine in accordance with an embodiment of the present disclosure. The decoder includes an internet or a network 1402, a network receiver 1404, a packet jitter buffer 1406, an out-of-band model bundle ID 1408, a depacketizer 1410, a packet loss indicator 1412, a decoder and framer 1414, a decoded frame 1416, an insert generated frame 1418, an audio frame 1420, a valid frame selector 1422, a playout audio frame 1424, a speaker 1426, a frame cache 1428, an in-band model ID 1430, a frame model ID 1432, a frame generator and model selector 1434, a frame generator model cache or library 1436, a frame generator model 1438, and a packet loss indicator 1440. A set of generated predictive models may be used at a second device to determine a minimal set of predictive models used to generate fill frames for a stream by (i) finding a predictive model that produces a lost frame of time series data with high confidence, (ii) using the minimum amount of computation resources, and (iii) completing the frame generation within a stipulated/allotted time. The functions of the other parts are known in the art.
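The three selection criteria (i) to (iii) can be combined into a simple filter-then-minimize rule: discard candidates below a confidence threshold or over the time budget, then take the one needing the least computation. `Candidate`, its fields, and the tie-break on higher confidence are hypothetical. Returning `None` marks the point where a caller might widen the search to the broadest model set, as the description of FIG. 15 suggests for low-confidence cases.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    confidence: float     # expected quality of its fill frames
    compute_units: float  # relative computation cost
    latency_ms: float     # time to generate one frame


def pick_generator(cands: Sequence[Candidate], min_conf: float,
                   deadline_ms: float) -> Optional[Candidate]:
    """Cheapest candidate meeting both the confidence and the deadline criteria."""
    usable = [c for c in cands
              if c.confidence >= min_conf and c.latency_ms <= deadline_ms]
    if not usable:
        return None  # caller may widen the search to the full model library
    # Least computation first; prefer higher confidence on ties.
    return min(usable, key=lambda c: (c.compute_units, -c.confidence))


models = [Candidate("A", 0.95, 5, 4), Candidate("B", 0.92, 2, 3),
          Candidate("C", 0.80, 1, 1), Candidate("D", 0.99, 10, 12)]
print(pick_generator(models, min_conf=0.9, deadline_ms=10).name)  # B
```

Model C is cheaper but fails the confidence test, and model D is the most confident but misses the 10 ms deadline, so B is chosen.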

FIG. 15 is a schematic illustration of a model selector and bundling system of a second device or a first device in accordance with an embodiment of the present disclosure. The model selector and bundling system of the second device or the first device includes an audio 1502, a digitizer 1504, a framer 1506, a bar length down sample 1508, an internet or a network 1510, a frame generator model training engine 1512, a frame generator and model selector 1514 that ranks models by probability fit, a push updated model 1516, in-band model selector information 1518, a model library 1520, a model bundle store 1522 (holding, for example, voice IDs and best-fit models), and a read and update 1524. When a lost frame of time series data is generated with a low confidence score, the second device may consider the broadest set of predictive machine learning models for predictive frame regeneration. The functions of the other parts are known in the art.
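
The widening of the candidate set after a low-confidence frame could look like the following sketch. The function `candidate_models`, the dictionary of probability-fit values, the default threshold of 0.7 and the `top_k` cut-off are all illustrative assumptions; the disclosure does not specify how the ranking or the threshold is computed.

```python
from typing import Dict, List


def candidate_models(fit_probability: Dict[str, float],
                     last_confidence: float,
                     confidence_threshold: float = 0.7,
                     top_k: int = 2) -> List[str]:
    """Rank model IDs by probability fit; widen to the whole library after a low-confidence frame."""
    ranked = sorted(fit_probability, key=lambda model_id: fit_probability[model_id], reverse=True)
    if last_confidence < confidence_threshold:
        return ranked          # broadest set: every model in the library
    return ranked[:top_k]      # normal case: only the best-fitting models


fits = {"speech": 0.6, "music": 0.3, "noise": 0.1}
print(candidate_models(fits, last_confidence=0.9))  # ['speech', 'music']
print(candidate_models(fits, last_confidence=0.5))  # ['speech', 'music', 'noise']
```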

FIG. 16 is a schematic illustration of an adaptive model selection system in accordance with an embodiment of the present disclosure. The adaptive model selection system includes a frame cache 1602, a model enabler 1604, a plurality of audio frame generator models 1604A-1604N, an adaptive generator pruning 1606, generated audio 1608A-1608N, scores 1610A-1610N, a voter 1612, a model selector 1614, and an audio frame 1616. A second device may communicatively connect with a first device to determine a better bundle model for a frame of time series data. The second device pushes a decoded frame of time series data to the first device. The first device may comprise the adaptive model selection system, which uses the frame of time series data to (i) classify the audio, (ii) select a best set of predictive model packages based on the computing capability of the second device to obtain bundle models, and (iii) send the bundle models to the second device. The second device may use a new bundle model to generate fill frames for that frame of time series data. The functions of the other parts are known in the art.
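
One way to picture the voter 1612 and the adaptive generator pruning 1606 is a routine that runs each enabled generator on the frame cache, keeps the candidate with the highest score, and marks poorly scoring generators for pruning. In this hypothetical sketch each generator reports its own score; the function `vote`, the three toy generators and the 0.2 pruning threshold are placeholders standing in for trained audio frame generator models and a real quality metric.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[float]
# A generator maps the cached history to (candidate frame, self-reported score).
Generator = Callable[[Sequence[Frame]], Tuple[Frame, float]]


def vote(generators: Dict[str, Generator],
         frame_cache: Sequence[Frame],
         prune_below: float = 0.2) -> Tuple[str, Frame, List[str]]:
    """Run every enabled generator, keep the best-scoring frame, and list models to prune."""
    results = {name: gen(frame_cache) for name, gen in generators.items()}
    to_prune = [name for name, (_, score) in results.items() if score < prune_below]
    winner = max(results, key=lambda name: results[name][1])
    return winner, results[winner][0], to_prune


generators: Dict[str, Generator] = {
    "repeat_last": lambda cache: (list(cache[-1]), 0.6),
    "silence": lambda cache: ([0.0] * len(cache[-1]), 0.1),
    "average": lambda cache: ([(a + b) / 2 for a, b in zip(cache[-2], cache[-1])], 0.8),
}
winner, frame, to_prune = vote(generators, [[1.0, 2.0], [3.0, 4.0]])
print(winner, frame, to_prune)  # average [2.0, 3.0] ['silence']
```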

FIG. 17 is a schematic illustration of a frame generator with a decoder in accordance with an embodiment of the present disclosure. The frame generator with the decoder includes an internet or a network 1702, network packets 1704, a decoded frame 1706, a packet loss indicator 1708, a generated frame 1710, a model selector 1712, and a playout queue 1714. A second device may comprise the decoder, which pushes a decoded frame of time series data to a first device (e.g. a cloud server). The first device uses the frame of time series data to (i) classify the audio, (ii) select a best set of predictive model packages based on the computing capability of the second device to obtain bundle models, and (iii) send the bundle models to the second device. The functions of the other parts are known in the art.

FIG. 18 is a schematic illustration of a frame classifier with a model generator for an audio stream in accordance with an embodiment of the present disclosure. The frame classifier with the model generator includes an internet or a network 1802, one or more peer packet streams 1804A-1804C, one or more packet loss indicators 1806A-1806C, one or more decoded frames 1808A-1808C, one or more generated frames 1810A-1810C, one or more model selectors 1812A-1812C, one or more playout queues 1814A-1814C, an audio mixer 1816, and a playout 1818. The frame classifier and the model generator may provide continuous improvement in audio quality, as at least one predictive machine learning model of a first device is periodically re-trained and a second device may receive the improved predictive machine learning models on each model update pull request. The functions of the other parts are known in the art.

FIG. 19 is a schematic illustration of a model regenerator with multiple streams in accordance with an embodiment of the present disclosure. The model regenerator includes an internet or a network 1902, one or more peer packet streams 1904A-1904C, one or more audio decoders 1906A-1906C, one or more packet loss indicators 1908A-1908C, an out of band cloud frame generator 1910, a packet loss indicator merger 1912, an audio mixer 1914, a frame cache 1916, a stream classifier and a model identifier 1918, a frame generator and a model selector 1920, a frame generator model cache or library 1922, a frame generator model 1924, a valid frame selector 1926, a playout frame 1928, and a speaker 1930. The model regenerator may regenerate a lost frame of time series data using the one or more peer packet streams 1904A-1904C. The functions of the other parts are known in the art.

FIG. 20 is a schematic illustration of a cloud mixer and a model selector in accordance with an embodiment of the present disclosure. The cloud mixer and the model selector include an internet or a network 2002, one or more peer packet streams 2004A-2004C, one or more audio decoders 2006A-2006C, an audio mixer 2008, a frame cache 2010, a stream classifier and a model identifier 2012, a frame generator and a model selector 2014, a frame trainer and update model cache or library 2016, and a generated stream 2018 with bundle information. The cloud mixer and the model selector may select a suitable predictive machine learning model to regenerate a lost frame of time series data. The functions of the other parts are known in the art.

FIG. 21 is a schematic illustration of a bit stream from frames of time series data in accordance with an embodiment of the present disclosure.

The schematic illustration shows the various dynamic sizes of the bitstream used for predictive frame generation, including input vectors of different sizes that are used by at least one predictive machine learning model. The frames 2102A-2102G are bit-streamed using bitstreams 2104A-2104F.
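
The variable-size input vectors can be illustrated by concatenating a different number of the most recent frames. In this sketch the helper `build_input_vector` and the two-sample frames are hypothetical; a real system would presumably feed each vector length to a model trained for that length, which may emit an output frame of a different size.

```python
from typing import Dict, List, Sequence


def build_input_vector(history: Sequence[Sequence[float]], n_frames: int) -> List[float]:
    """Concatenate the most recent n_frames frames into one flat input vector."""
    if not 1 <= n_frames <= len(history):
        raise ValueError("n_frames must be between 1 and len(history)")
    window = list(history)[-n_frames:]
    return [sample for frame in window for sample in frame]


history = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three frames of two samples each
vectors: Dict[int, List[float]] = {n: build_input_vector(history, n) for n in (1, 2, 3)}
print({n: len(v) for n, v in vectors.items()})  # {1: 2, 2: 4, 3: 6}
```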

FIG. 22 is a schematic illustration of an encoder with a frame classifier in accordance with an embodiment of the present disclosure. The encoder with the frame classifier comprises an input data 2202, a framer 2204, an encoder 2206, a frame generator model training engine 2208, a frame generator and model selector 2210, a bit streamer 2212, a model store 2214, a packetizer 2216, a network transmitter 2218, an internet or a network 2220, an out-band model 2222, an in-band model selector 2224, a push updated model 2226, a read and update model 2228, and an online/dynamic frame generator model training system 2230. The framer 2204 and the encoder 2206 frame and encode a previously received frame of time series data. The model store 2214 comprises at least one predictive machine learning model. The frame generator model training engine 2208 trains the at least one predictive machine learning model based on the previously received frame of the time series data. The frame generator and model selector 2210 may select a suitable predictive machine learning model for regenerating a lost frame of the time series data from a state space representation associated with frames of the time series data. The functions of the other parts are known in the art.

FIG. 23 is a schematic illustration of a decoder with a frame generator in accordance with an embodiment of the present disclosure. The decoder with the frame generator includes an internet or a network 2302, a network receiver 2304, an out of band model bundle ID 2306, a packet jitter buffer 2308, a depacketizer 2310, in-band information 2312, a frame model ID 2314, a packet loss indicator 2316, a decoder and framer 2318, a decoded frame 2320, a frame cache 2322, a frame generator and model selector 2324, a frame generator and model cache or library 2326, a frame generator model 2328, an insert generated frame 2330, an audio frame 2332, a valid frame selector 2334, and an output stream 2336. A set of generated predictive models may be used at a second device to determine a minimal set of predictive models for generating fill frames for a stream by (i) finding a predictive model that regenerates a lost frame of time series data with high confidence, (ii) using the minimum amount of computation resources, and (iii) completing the frame generation within a stipulated/allotted time. The functions of the other parts are known in the art.

FIGS. 24A-24C are flow diagrams illustrating a method for low-latency communication from a first device to a second device over unreliable networks using at least one predictive machine learning model according to an embodiment of the present disclosure. At a step 2402, at least one frame of time series data is represented at the first device. The time series data is a series of data points indexed in time order. At a step 2404, at least one output stream, metadata associated with the at least one output stream, and a plurality of external inputs from the first device are recorded in an interaction recorder of the second device. The at least one output stream comprises the at least one frame of time series data. At a step 2406, a background area of an image is segmented into at least one background area stream, and the at least one background area stream is captured from a plurality of users. At a step 2408, at least one character-centered portion of the image is compressed into a character focus stream, enabling an output image to be treated as two streams. At a step 2410, the at least one predictive machine learning model at the first device is trained for predictive frame regeneration by providing the at least one output stream from the interaction recorder as input. At a step 2412, the results or interactions are transmitted in a time series from the first device to the second device. At a step 2414, at least one lost frame of time series data is detected at the second device using the at least one predictive machine learning model. At a step 2416, the at least one lost frame of the time series data is regenerated at the second device using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data. At a step 2418, an application stream from a stream of data obtained from the unreliable networks is compared, at the second device and using a decision engine, with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model. The application stream comprises at least one frame of time series data.
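
The role of the decision engine at step 2418, where received frames take precedence over regenerated ones, might be sketched as a playout queue keyed by sequence number. The `PlayoutQueue` class and its methods are hypothetical, and frames are plain lists of samples. Letting a late, actual frame replace a queued fill frame mirrors the behavior recited in claim 12, so that later fill frames are generated from real data.

```python
from typing import Dict, List, Optional

Frame = List[float]


class PlayoutQueue:
    """Hypothetical valid-frame selector: received frames always win over fill frames."""

    def __init__(self) -> None:
        self.frames: Dict[int, Frame] = {}
        self.is_fill: Dict[int, bool] = {}

    def add_received(self, seq: int, frame: Frame) -> None:
        # A frame from the application stream replaces any fill frame, even if it is late.
        self.frames[seq] = frame
        self.is_fill[seq] = False

    def add_generated(self, seq: int, frame: Frame) -> None:
        # A regenerated frame only fills a gap; it never overwrites received data.
        if seq not in self.frames:
            self.frames[seq] = frame
            self.is_fill[seq] = True

    def pop(self, seq: int) -> Optional[Frame]:
        # Hand the selected frame to playout and forget it.
        self.is_fill.pop(seq, None)
        return self.frames.pop(seq, None)


playout = PlayoutQueue()
playout.add_generated(7, [0.5, 0.5])   # frame 7 declared lost, so a fill frame is queued
playout.add_received(7, [0.9, 1.1])    # the real frame 7 arrives late and replaces it
print(playout.pop(7), playout.pop(8))  # [0.9, 1.1] None
```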

FIG. 25 is an illustration of an exploded view of a distributed computing system or cloud computing implementation in accordance with an embodiment of the present disclosure. The exploded view comprises an input interface 2502, a control module that comprises a processor 2504, a memory 2506 and a non-volatile storage 2508, processing instructions 2510, a shared/distributed storage 2512, a server that comprises a server processor 2514, a server memory 2516 and a server non-volatile storage 2518, and an output interface 2520. The functions of the server processor 2514, the server memory 2516 and the server non-volatile storage 2518 are identical to those of the processor 2504, the memory 2506 and the non-volatile storage 2508, respectively. The functions of the other parts are known in the art.

Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as “including”, “comprising”, “incorporating”, “have”, “is” used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural.

1. A system for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model, wherein the system, when in operation: represents at least one frame of time series data at the first device, wherein the time series data is a series of data points indexed in time order; records at least one output stream, metadata associated with the at least one output stream, and a plurality of external inputs from the first device in an interaction recorder of the second device, wherein the at least one output stream comprises the at least one frame of time series data; segments an image into at least one background area stream and at least one character-centered portion, wherein the at least one background area stream is captured from a plurality of users; compresses the at least one character-centered portion of the image into a character focus stream; detects, at the second device, at least one lost frame of time series data using a frame of time series data from previously received frames; trains the at least one predictive machine learning model at the first device for predictive frame regeneration by providing the at least one output stream from the interaction recorder as an input; transmits results of the training or interactions between the first device and the second device, in a time series, from the first device to the second device; regenerates the at least one lost frame of time series data, at the second device, using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and combines an output stream from an application stream with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model, at the second device, to obtain a modified output stream, wherein the application stream comprises the at least one frame of time series data.
2. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the results or interactions in the time series comprise a state space representation or the modified output stream of the at least one frame of time series data, wherein the state space representation comprises interactions between the first device and the second device.
3. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein training of the at least one predictive machine learning model comprises generating a plurality of predictive machine learning models based on a number of frames in a sequence and the computing capability of the second device.
4. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 3, wherein the plurality of predictive machine learning models comprises a stream source classification model, wherein the stream source classification model is selected by identifying the at least one predictive machine learning model to be used when an input is not tagged as a particular type.
5. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 2, wherein the system, when in operation, provides the state space representation and the interactions between the first device and the second device as an input for training the at least one predictive machine learning model and generates a plurality of predictive machine learning models based on the input.
6. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the system, when in operation, selects a suitable predictive machine learning model for the predictive frame regeneration based on the second device's computing capability and a quality of the at least one regenerated frame of time series data.
7. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the predictive frame regeneration comprises the at least one background area stream and the character focus stream.
8. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the at least one lost frame of the time series data is detected using a frame loss indicator.
9. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the system, when in operation, detects a packet lost in the at least one frame of time series data by a packet sequence number or by using a mean or a median inter-arrival time.
10. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the system, when in operation, calibrates an acoustic model with a decoder, wherein the acoustic model enables the decoder to regenerate the at least one lost frame of the time series data from lost data.
11. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the system, when in operation, produces fill frames using a different number of input frames as an input vector to the at least one predictive machine learning model to generate an output frame, wherein the output frame is of a different frame size.
12. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 11, wherein the fill frames in a frame queue are replaced by actual frames that arrive later, to improve the accuracy of subsequent frames to be generated using real-time time series data.
13. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the system, when in operation, trains the at least one predictive machine learning model with specific stream content.
14. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 13, wherein the system, when in operation, associates a bundle model with the specific stream content, different input frame sizes and output frames into one package based on the second device's computing capability.
15. A system for low-latency communication from a first device to a second device over an unreliable network as claimed in claim 1, wherein the system, when in operation, generates a confidence score of a quality of the at least one regenerated frame of time series data regenerated by the at least one predictive machine learning model, wherein the at least one regenerated frame of time series data with a confidence score above a specified confidence threshold is reused in a subsequent fill-frame generation.
16. A cloud game interactive system for low-latency communication from a game execution system to a video stream decoder over an unreliable network using at least one predictive machine learning model, wherein the game execution system and the video stream decoder comprise one or more processors and one or more non-transitory computer-readable mediums storing one or more sequences of instructions which, when executed by the one or more processors, cause: representing at least one frame of time series data at the game execution system, wherein the at least one frame of time series data is a series of data points indexed in time order; recording at least one output stream, metadata associated with the at least one output stream, and a plurality of external inputs from the game execution system in an interaction recorder of the video stream decoder, wherein the at least one output stream comprises the at least one frame of time series data; segmenting an image into at least one background area stream and at least one character-centered portion, wherein the at least one background area stream is captured from a plurality of users; compressing the at least one character-centered portion of the image into a character focus stream; detecting, at the video stream decoder, at least one lost frame of time series data using a frame of time series data from previously received frames; training the at least one predictive machine learning model at the game execution system for predictive frame regeneration by providing, as an input, the at least one output stream from the interaction recorder or an instantaneous application state comprising interactive application state variables, wherein the interactive application state variables describe a state space representation of a closed-loop control defined by a user's interaction with an application that drives creation of the audio and video; transmitting results of the training or interactions between the game execution system and the video stream decoder, in a time series, from the game execution system to the video stream decoder; regenerating the at least one lost frame of time series data, at the video stream decoder, using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and combining an output stream from an application stream with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model, at the video stream decoder, to obtain a modified output stream, wherein the application stream comprises the at least one frame of time series data.
17. An adaptive model selection system for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model, wherein the first device and the second device comprise one or more processors and one or more non-transitory computer-readable mediums storing one or more sequences of instructions which, when executed by the one or more processors, cause: representing at least one frame of time series data at the first device, wherein the at least one frame of time series data is a series of data points indexed in time order; recording at least one output stream, metadata associated with the at least one output stream, and a plurality of external inputs from the first device in an interaction recorder of the second device, wherein the at least one output stream comprises the at least one frame of time series data; segmenting an image into at least one background area stream and at least one character-centered portion, wherein the at least one background area stream is captured from a plurality of users; compressing the at least one character-centered portion of the image into a character focus stream; detecting, at the second device, at least one lost frame of time series data using a frame of time series data from previously received frames; pushing, from the second device to the first device, a decoded frame of time series data; selecting, at the first device, a best set of predictive model packages to obtain bundle models, and sending, from the first device to the second device, the best set of predictive model packages; training the at least one predictive machine learning model, from the best set of predictive model packages, at the first device for predictive frame regeneration by providing the at least one output stream from the interaction recorder as an input; transmitting results of the training or interactions between the first device and the second device, in a time series, from the first device to the second device; regenerating the at least one lost frame of time series data, at the second device, using the at least one predictive machine learning model based on the at least one output stream to obtain at least one regenerated frame of time series data; and combining an output stream from an application stream with the at least one regenerated frame of time series data obtained from the at least one predictive machine learning model, at the second device, to obtain a modified output stream, wherein the application stream comprises the at least one frame of time series data.
18. An adaptive model selection system for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model as claimed in claim 17, wherein the one or more processors are further configured to train the at least one predictive machine learning model with specific stream content.
19. An adaptive model selection system for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model as claimed in claim 18, wherein the one or more processors are further configured to calibrate an acoustic model into a decoder, wherein the acoustic model enables the decoder to regenerate the at least one lost frame of the time series data from lost data.
20. An adaptive model selection system for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model as claimed in claim 18, wherein the one or more processors are further configured to produce fill frames using a different number of input frames as an input vector to the at least one predictive machine learning model to generate an output frame, wherein the output frame is of a different frame size.
21. An adaptive model selection system for low-latency communication from a first device to a second device over an unreliable network using at least one predictive machine learning model as claimed in claim 17, wherein the one or more processors are further configured to generate a confidence score of a quality of the at least one regenerated frame of time series data regenerated by the at least one predictive machine learning model, wherein the at least one regenerated frame of time series data with a confidence score above a specified confidence threshold is reused in a subsequent fill-frame generation.
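
Claim 9 recites detecting a lost packet either by packet sequence number or by a mean or median inter-arrival time. Both tests are sketched below under simplifying assumptions: sequence numbers are taken not to wrap around (a real 16-bit counter, as in RTP, would need wraparound handling), and the overdue factor of 2.0 is an arbitrary illustrative value. The function names `missing_sequence_numbers` and `packet_overdue` are hypothetical.

```python
import statistics
from typing import List, Sequence


def missing_sequence_numbers(received: Sequence[int]) -> List[int]:
    """List sequence numbers absent between the lowest and highest received packet."""
    seen = set(received)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


def packet_overdue(arrival_ms: Sequence[float], now_ms: float,
                   factor: float = 2.0, use_median: bool = True) -> bool:
    """Suspect a loss when the silence since the last packet exceeds factor times the typical gap."""
    if len(arrival_ms) < 2:
        return False  # not enough history to estimate an inter-arrival time
    gaps = [b - a for a, b in zip(arrival_ms, arrival_ms[1:])]
    typical = statistics.median(gaps) if use_median else statistics.mean(gaps)
    return now_ms - arrival_ms[-1] > factor * typical


print(missing_sequence_numbers([1, 2, 4, 5, 7]))       # [3, 6]
print(packet_overdue([0.0, 20.0, 40.0, 60.0], 75.0))   # False
print(packet_overdue([0.0, 20.0, 40.0, 60.0], 110.0))  # True
```

The median variant is less sensitive than the mean to a single delayed burst, which is one reason a system might prefer it on jittery links.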